Spark: a navigational paradigm for genomic data exploration.

Authors

  • Cydney B Nielsen
  • Hamid Younesy
  • Henriette O'Geen
  • Xiaoqin Xu
  • Andrew R Jackson
  • Aleksandar Milosavljevic
  • Ting Wang
  • Joseph F Costello
  • Martin Hirst
  • Peggy J Farnham
  • Steven J M Jones
Abstract

Biologists possess the detailed knowledge critical for extracting biological insight from genome-wide data resources, and yet they are increasingly faced with nontrivial computational analysis challenges posed by genome-scale methodologies. To lower this computational barrier, particularly in the early data exploration phases, we have developed an interactive pattern discovery and visualization approach, Spark, designed with epigenomic data in mind. Here we demonstrate Spark's ability to reveal both known and novel epigenetic signatures, including a previously unappreciated binding association between the YY1 transcription factor and the corepressor CTBP2 in human embryonic stem cells.

Similar resources

Surfing the city: An architecture for context aware urban exploration

Web surfing, the act of following links of interest with no pre-defined search goal, is a paradigm that can be translated to the physical realm of urban exploration. With mobile computing technology and its supporting infrastructure becoming ever more ubiquitous, a user's digital device can be transformed into a portal that connects their physical environment with the virtual, providing instant...

Creating a Portable, High-Level Graph Analytics Paradigm For Compute and Data-Intensive Applications

HPC offers tremendous potential to process the large amounts of data commonly referred to as ‘Big Data’. Due to the immense computational requirements of Big Data applications, the HPC and Big Data communities are converging. As a result, heterogeneous and distributed systems are becoming commonplace. In order to take advantage of the immense computing power of these systems, distributing data effic...

VariantSpark: Applying Spark-based machine learning methods to genomic information

Genomic information is increasingly being used for medical research, giving rise to the need for efficient analysis methodology able to cope with thousands of individuals and millions of variants. Catering for this need, we developed VariantSpark, a framework for applying machine learning algorithms in MLlib to genomic variant data using the efficient in-memory Spark compute engine. We demonstr...

Immersive graph-based visualization and exploration of biological data relationships

Genomic information has characteristics that make it very difficult to interpret and to exploit. Such data constitute an important factual resource (GenBank, SwissProt, GeneOntology, or Decrypthon...), are heterogeneous, huge in quantity, and are geographically distributed. They are also recorded in structured or semi-structured formats within public or private databanks. Nevertheless,...

hMDAP: A Hybrid Framework for Multi-paradigm Data Analytical Processing on Spark

We propose hMDAP, a hybrid framework for large-scale data analytical processing on Spark, to support multi-paradigm processing (including OLAP, machine learning, and graph analysis) in distributed environments. The framework features a three-layer data process module and a business process module which controls the former. We will demonstrate the strength of hMDAP by using traffic scenarios in a ...

Journal:
  • Genome Research

Volume 22, Issue 11

Pages: -

Publication date: 2012